Improving Named Entity Extraction Accuracy using Unlabeled Data and Several Extractors (pp. 29-38)

Authors

  • Tomoya Iwakura
  • Seishi Okamoto

Abstract

This paper proposes feature augmentation methods that use unlabeled data and several Named Entity (NE) extractors. Using NE extractors, we collect NE-related information for each word from unlabeled data; we call this information NE-related labels. The collected NE-related labels include the candidate NE class labels of each word and the NE class labels of its co-occurring words. To collect NE-related labels from unlabeled data accurately, we consider collection methods that use the outputs of several NE extractors. We then use the NE-related labels as additional features for building new NE extractors. We apply NE extraction methods using these NE-related labels to the IREX Japanese NE extraction task. Experimental results show higher accuracy than previously reported results obtained with NE extractors that use handcrafted resources.
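
The pipeline in the abstract (tag unlabeled text with several extractors, keep reliable labels, count each word's candidate NE classes and its neighbours' NE classes, then append these as features) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say how extractor outputs are combined, so the unanimous-agreement rule, the ±1-token window used for "co-occurring words", the feature-string format, the toy English data, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical data layout: one tokenized sentence, plus one NE class label per
# token from each extractor ("O" marks a token that is not part of an NE).
Sentence = List[str]
Labels = List[str]
LabelCounts = Dict[str, Counter]


def combine_outputs(outputs: Sequence[Labels]) -> List[Optional[str]]:
    """Keep a token's label only if all extractors agree (assumed rule); else None."""
    combined: List[Optional[str]] = []
    for token_labels in zip(*outputs):
        first = token_labels[0]
        combined.append(first if all(lab == first for lab in token_labels) else None)
    return combined


def collect_ne_related_labels(
    corpus: Sequence[Sentence],
    extractor_outputs: Sequence[Sequence[Labels]],
    window: int = 1,
) -> Tuple[LabelCounts, LabelCounts]:
    """Count candidate NE classes of each word and NE classes of nearby words."""
    candidates: LabelCounts = defaultdict(Counter)
    cooccurring: LabelCounts = defaultdict(Counter)
    for sentence, outputs in zip(corpus, extractor_outputs):
        labels = combine_outputs(outputs)
        for i, word in enumerate(sentence):
            if labels[i] not in (None, "O"):
                candidates[word][labels[i]] += 1
            # Assumption: tokens within +/- window positions count as "co-occurring".
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j != i and labels[j] not in (None, "O"):
                    cooccurring[word][labels[j]] += 1
    return candidates, cooccurring


def augment_features(
    word: str, candidates: LabelCounts, cooccurring: LabelCounts, min_count: int = 1
) -> List[str]:
    """Append the collected NE-related labels to a word's base feature."""
    feats = [f"w={word}"]
    for cls, n in sorted(candidates.get(word, Counter()).items()):
        if n >= min_count:
            feats.append(f"cand={cls}")
    for cls, n in sorted(cooccurring.get(word, Counter()).items()):
        if n >= min_count:
            feats.append(f"cooc={cls}")
    return feats


# Toy "unlabeled" corpus tagged by three hypothetical extractors.
corpus = [["Tokyo", "Station", "opened"], ["Sato", "visited", "Tokyo"]]
extractor_outputs = [
    [["LOCATION", "LOCATION", "O"],
     ["LOCATION", "LOCATION", "O"],
     ["LOCATION", "ORGANIZATION", "O"]],
    [["PERSON", "O", "LOCATION"]] * 3,
]
cands, coocs = collect_ne_related_labels(corpus, extractor_outputs)
print(augment_features("Tokyo", cands, coocs))    # ['w=Tokyo', 'cand=LOCATION']
print(augment_features("visited", cands, coocs))  # ['w=visited', 'cooc=LOCATION', 'cooc=PERSON']
```

Here "Station" receives no candidate label because the three extractors disagree on it, which illustrates the trade-off of a strict agreement rule: fewer but more reliable labels. Majority voting would be an equally plausible alternative under this sketch's assumptions.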

Similar resources

Pattern-based Aggregation of Named Entity Extractors

Despite significant advances in named entity extraction technologies, state-of-the-art extraction tools achieve insufficient accuracy rates for practical use in many operational settings. However, they are not all prone to the same types of error, suggesting that substantial improvements may be achieved via appropriate combinations of existing tools, provided their behavior can be accurately ch...

Semi-supervised Relation Extraction using EM Algorithm

Relation extraction is the task of identifying relations between entities in a natural language sentence. We propose a semi-supervised approach to relation extraction based on the EM algorithm, which uses a few relation-labeled seed examples and a large number of unlabeled examples (but labeled with entities). We present an analysis of how unlabeled data helps improve the overall accuracy compared t...

Improving Persian Named Entity Recognition Using the Ezafe Construction (Kasre-ye Ezafe)

Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.), and organizations (public and private companies, international institutions, etc.), as well as dates, currencies, and percentages, are identified in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

Bayesian Model Averaging of Named Entity Extraction Algorithms

Automatic information extraction (IE) has emerged as a critical tool for commercial, industrial, and governmental applications that are confronted with an explosive growth of digital information. Within the framework of information extraction a hierarchy of objectives exists, many of which are heavily dependent upon the automatic recognition of people, places, and organizations—or, more specifi...

Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text

This paper describes how a machine-learning named entity recognizer (NER) for upper case text can be improved by using a mixed case NER and some unlabeled text. The mixed case NER can be used to tag some unlabeled mixed case text, which is then used as additional training material for the upper case NER. We show that this approach reduces the performance gap between the mixed case NER and the up...

Journal title:
  • Polibits

Volume 40

Publication date: 2009